ABSTRACT


Performance prediction has emerged as one of the most popular
approaches to leverage large volumes of online learning data. In the
majority of current works, performance prediction is based on students’
past activities in graded learning resources (such as problems and
quizzes), while their activities in non-graded resources (such as
reading material) are ignored. In this paper, we introduce an approach
that can take advantage of students’ work with non-graded learning
resources, as auxiliary data, in order to predict students’ performance
in graded resources. This approach can discover the hidden
inter-relationships between learning resources of different types,
using only student activity data. Based on our experiments, the
proposed approach can significantly reduce the error of student
performance prediction, compared to baseline algorithms, while
discovering meaningful and surprising relationships among learning
resources.
1. INTRODUCTION AND RELATED WORK


The abundance of learning data, due to the popularity of Massive Open
Online Courses (MOOCs), introduces new opportunities and challenges for
the educational data mining (EDM) field. On one hand, larger volumes of
student data can increase the performance of traditional EDM
approaches. For example, performance prediction, an approach that is
popular in the area of intelligent tutoring systems, offers a good
basis for learning personalization. If the data-driven performance
model predicts that some problem will be solved by the current student
with a high probability, this problem could be skipped in favor of a
more challenging one. If the expected performance is low, students
could be offered some help and supplementary material. MOOC-scale data
can help improve performance prediction, making this approach more
usable. On the other hand, data coming from modern MOOCs is usually
more heterogeneous and too complicated for traditional EDM approaches.
Unlike conventional Intelligent Tutoring Systems (ITS), which are
mostly based on problem-solving, MOOCs allow students to learn and
assess their knowledge using a variety of learning resources, such as
reading materials, lecture videos, assignments, exams, graded quizzes,
and discussions. This leads to various types of learning activities for
students. With that heterogeneity come interesting challenges: how can
information about student work with diverse learning resources be used
to assess student knowledge or predict student performance? What is the
relationship between concepts that are offered in different learning
resource types?


A number of research projects focused on alternative learning resources
have demonstrated that many kinds of resources can contribute
considerably to student learning. For example, Najar et al. studied the
effect of adaptive worked examples versus unsupported problem solving
and showed that adaptive worked examples can lead to faster and more
effective learning [Najar et al. 2014]. Also, Agrawal et al. showed
that enriching textbooks with additional forms of content, such as
images and videos, increases the helpfulness of learning material
[Agrawal et al. 2014]. This indicates that ignoring the interaction
between various types of resources limits our understanding of
students’ learning behavior and the efficiency of mining and analytical
tasks, such as student knowledge modeling or performance prediction.
Additionally, understanding inter-relationships between different
resource types and student activities can help instructors make
better-informed decisions about their course design. Modeling such
inter-relationships in students’ data can provide a unified view of
data heterogeneity and a better understanding of student learning
across the different resource types in which student activities occur.


While there are some studies in the literature on impact 
of various learning resources on learning, the relationship 
between learning resource types and their effect on predict- 
ing student performance is under-investigated. For example, 
Wen and Rosé studied student patterns across different activity types
and concluded that these patterns can provide insights into different
activity distributions between high-grade and low-grade students
[Wen and Rosé 2014]. However, their goal was not to predict student
grades from their activities. Velasquez et al. identified learning aid
use patterns using cluster analysis. They showed that high use of
learning aids is significantly correlated with students’ exam
performance, but they did not predict student performance. Sao Pedro et al.
extended Bayesian Knowledge Tracing by condi- 
tioning the learning on whether the students received scaf- 
folding in a topic or not. This model uses extra context infor- 
mation (topics) in addition to student performance, does not 
discover the relationship between learning resource types, 
and does not distinguish between different learning resources. 
Jifi and Pelanek studied learning resource similarities 
[and Pelanek 2017], but it was on graded resources, not con- 
sidering resource types, and not predicting student perfor- 
mance. 


One reason that heterogeneous resources are rarely used for predicting
student performance is their potentially conflicting effects. For
example, Beck et al. investigated whether providing assistance (help)
to students benefits them, using experimental trials, the Bayesian
Evaluation and Assessment framework, and learning decomposition
[Beck et al. 2008]. In their studies, experimental trials and learning
decomposition showed that assistance hurts students’ learning. However,
the Bayesian Evaluation and Assessment framework found that assistance
promoted students’ long-term learning. More recently, Huang et al.
discovered that adapting their framework (FAST) for student modeling by
including various activity types may lead researchers to contradictory
conclusions [Huang et al. 2015]. More specifically, they studied the
impact of example usage on student learning. In one of their
formulations, student example activity suggests a positive association
with model parameters, such as probability of learning, while in
another formulation this type of activity has a negative association
with model parameters. Also, Hosseini et al. concluded that annotated
examples show a negative relationship with students’ learning because
of a selection effect: while annotated examples may help students to
learn, weaker students may study more annotated examples
[Hosseini et al. 2016].


Another complication for considering heterogeneous resources 
is the difficulty in interpreting students’ observed activities. 
In graded resource types, such as assignments and quizzes, 
a student’s score explicitly represents her knowledge on the 
topic. Whereas in other resource types, such as reading ma- 
terial, there is no direct evaluation or explicit observation 
of student’s knowledge. Hence, measuring the effect of such 
learning resources on students’ knowledge, and thus predict- 
ing their future performance, would be a challenging task. 


In this paper we propose an approach motivated by canonical correlation
analysis (CCA) to discover the interaction between different learning
resource types, using student activities, and to predict student
performance on different learning resources. Our proposed approach can
uncover latent relationships among subsets of learning resources from
different types and can quantify these relationships. Our analysis on
two real-world datasets demonstrates that the discovered relationships
are meaningful and can be used for course design and adaptive learning
purposes. Additionally, the proposed approach can use student
interactions with one auxiliary learning resource type (such as
examples) to predict students’ performance on another target learning
resource type (such as problems). Our experiments on four real-world
datasets show that our approach can efficiently use the extra
information provided by auxiliary learning resources and significantly
reduce the student performance prediction error compared to the
baseline models.


2. THE APPROACH 


Our proposed approach is inspired by Canonical Correla- 
tion Analysis (CCA) [Hotelling 1936], which is a multi- 
variate statistical model that studies the interrelationships 
among sets of multiple dependent and independent variables. CCA’s goal
is to find linear projections of these variable sets into a shared
latent space such that the correlation between these projections is
maximized. In this research,
we use CCA as our main tool: we propose to find the rela- 
tionship between students’ ungraded activities (as indepen- 
dent variables) and students’ graded activities (as dependent 
variables) using CCA. Our final goal is to propose a model 
for predicting student performance using pairs of resource 
types, motivated by the discovered relationships. 


Our reason for choosing CCA as inspiration is twofold. First, 
CCA provides different views to the same data samples. 
Since we have the same students interacting with multiple 
resource types (e.g., examples and problems), we need to 
have a tool to model these interactions at the same time, 
while distinguishing between distinct resource types (as dif- 
ferent views). Other factor analysis models, such as Princi- 
pal Component Analysis (PCA), operate on one single view 
of the data and are not appropriate for our problem. Second, 
because of having multiple learning resources within each re- 
source type (e.g., multiple problems and multiple examples) 
and several students (as datapoints) we need a multi-variate 
statistical model to capture the two-dimensional variability 
in the data. Bivariate or simpler multivariate models such as 
correlation or regression analysis can only capture the data 
variance for one dependent variable at a time and thus miss 
the variability of either students or learning material. We 
first give a brief background on CCA and then explain how 
to model and solve our problems using it. 


CCA. If matrix $X_{m \times n}$ represents n data samples and m
variables and matrix $Y_{p \times n}$ contains the values for p
variables of the same n data samples, CCA aims to find linear
transformations, $w_x$ and $w_y$, such that the correlation between
projections of X and Y through $w_x$ and $w_y$ (reflected as $\rho$ in
Equation 1) is maximized.

$$\rho = \frac{w_x^T X Y^T w_y}{\sqrt{(w_x^T X X^T w_x)(w_y^T Y Y^T w_y)}} \tag{1}$$

Since multiplication of $w_x$ and $w_y$ by a constant does not change
the value of $\rho$ in Equation 1, the problem of finding $w_x$ and
$w_y$ can be formulated as in Equation 2.

$$\max_{w_x, w_y} \; w_x^T X Y^T w_y \quad \text{subject to} \quad w_x^T X X^T w_x = 1, \; w_y^T Y Y^T w_y = 1 \tag{2}$$

Adding the regularization parameters to Equation 2 for controlling
over-fitting of $\rho$, Sun et al. show that this regularized-CCA
problem can be represented as in Equation 3 and solved using a least
squares approach [Sun et al. 2008]. The formulation for $w_y$ is a
symmetrical version of Equation 3.

$$X Y^T (Y Y^T)^{-1} Y X^T w_x = (X X^T + \lambda I) w_x \tag{3}$$


In addition to $w_x$ and $w_y$ that produce the maximum correlation
$\rho$, there can be other projection vector pairs that can map the X
and Y matrices with correlations less than or equal to $\rho$. The
optimization problem in Equation 4 finds these multiple projection
vectors for X (in matrix $W_x$).

$$\max_{W_x} \; \mathrm{Trace}\left(W_x^T X Y^T (Y Y^T)^{-1} Y X^T W_x\right) \quad \text{subject to} \quad W_x^T X X^T W_x = I \tag{4}$$
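One way to obtain several projection vectors at once is to treat the trace maximization as a symmetric eigenproblem. The sketch below uses NumPy and is an assumption-laden illustration: it follows a standard eigendecomposition route rather than the least squares formulation of Sun et al., it regularizes both covariance matrices with a hypothetical `reg` parameter, and the function name `regularized_cca` and the synthetic data are invented for the example.

```python
import numpy as np

def regularized_cca(X, Y, n_components, reg=1e-3):
    """Sketch of CCA for X (m x n) and Y (p x n); columns are the same n samples.

    Returns W_x (m x c), W_y (p x c), and the c canonical correlations.
    """
    X = X - X.mean(axis=1, keepdims=True)   # center each variable (assumed preprocessing)
    Y = Y - Y.mean(axis=1, keepdims=True)
    Cxx = X @ X.T + reg * np.eye(X.shape[0])
    Cyy = Y @ Y.T + reg * np.eye(Y.shape[0])
    Cxy = X @ Y.T

    # Whiten with Cxx^{-1/2} so that W_x^T Cxx W_x = I holds by construction.
    evals, evecs = np.linalg.eigh(Cxx)
    Cxx_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

    # Symmetric eigenproblem equivalent to the trace maximization.
    M = Cxx_inv_sqrt @ Cxy @ np.linalg.solve(Cyy, Cxy.T) @ Cxx_inv_sqrt
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1][:n_components]      # largest eigenvalues first
    rhos = np.sqrt(np.clip(vals[order], 0.0, None))    # eigenvalues are squared correlations

    W_x = Cxx_inv_sqrt @ vecs[:, order]
    W_y = np.linalg.solve(Cyy, Cxy.T @ W_x) / np.maximum(rhos, 1e-12)
    return W_x, W_y, rhos

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 200))                        # synthetic auxiliary activity
Y = 0.8 * X[:3] + 0.2 * rng.normal(size=(3, 200))    # synthetic target scores tied to X
W_x, W_y, rhos = regularized_cca(X, Y, n_components=2)
```

In this sketch the eigenvalues play the role of squared canonical correlations, so the returned `rhos` are sorted from the strongest to the weakest projection pair.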


2.1 Relation Discovery between Learning Resource Types

As students work with various learning resources that are 
provided in an online course or a tutoring system, they gain 
more knowledge about the concepts presented in the course 
and can tackle more complicated problems. Knowing the 
relationship between different learning resource types and 
the way they interact in affecting students’ knowledge can 
help improve course design. Having the learning material from
one resource type (e.g., problems) as one set of variables and 
learning material from another type (e.g., examples) as the 
other set of variables, we can interpret canonical correlation 
as a measure of relatedness between resource types. 


More specifically, to map our problem to the CCA setting, we suppose
that there are n students who have at least one activity in each of the
resource types. For example, these students may have tried some
problems and studied some examples in the course. We represent the
students’ performance on problems as a matrix $Y_{p \times n}$, with n
students, p problems, and $Y_{i,j}$ representing student j’s score on
problem i. This score can be a grade or a pass/fail indicator.
Similarly, students’ example activities can be represented as another
matrix $X_{m \times n}$, with n students, m examples, and $X_{i,j}$ as
an indication that student j has read example i. Given these two
activity matrices, we use CCA to find linear transformations $W_x$ and
$W_y$ and canonical correlations $P$ as in Equation 4.
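The construction of the two activity matrices can be sketched as follows. The log format (tuples of student, resource, and score), the function name `build_activity_matrices`, and the toy records are assumptions made for illustration; missing problem scores are left as `None`.

```python
def build_activity_matrices(example_views, problem_attempts):
    """Build X (examples x students) and Y (problems x students) from logs.

    example_views: iterable of (student, example) pairs.
    problem_attempts: iterable of (student, problem, score), ordered by time.
    """
    views = list(example_views)
    attempts = list(problem_attempts)
    # Keep only students with at least one activity in each resource type.
    kept = {s for s, _ in views} & {s for s, _, _ in attempts}
    students = sorted(kept)
    examples = sorted({e for s, e in views if s in kept})
    problems = sorted({q for s, q, _ in attempts if s in kept})
    s_idx = {s: j for j, s in enumerate(students)}
    e_idx = {e: i for i, e in enumerate(examples)}
    q_idx = {q: i for i, q in enumerate(problems)}

    X = [[0] * len(students) for _ in examples]      # 1 if the student read the example
    Y = [[None] * len(students) for _ in problems]   # None marks a missing score
    for s, e in views:
        if s in kept:
            X[e_idx[e]][s_idx[s]] = 1
    for s, q, score in attempts:
        if s in kept and Y[q_idx[q]][s_idx[s]] is None:   # keep the first attempt only
            Y[q_idx[q]][s_idx[s]] = score
    return students, examples, problems, X, Y

views = [("ann", "ex1"), ("bob", "ex2"), ("ann", "ex2"), ("cat", "ex1")]
attempts = [("ann", "p1", 1), ("bob", "p1", 0), ("ann", "p1", 0), ("dan", "p2", 1)]
students, examples, problems, X, Y = build_activity_matrices(views, attempts)
```

Here "cat" and "dan" are dropped because each has activity in only one resource type, which mirrors the student selection rule stated above.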


Formulating our problem as an instance of CCA, $W_x$ and $W_y$
represent linear transformation matrices that map the original activity
matrices X and Y into a shared latent space. These projections are
scaled based on the canonical correlation values in a diagonal matrix
$P_{c \times c}$, in which each diagonal element is the canonical
correlation value $\rho_i$ of the i-th projection vector pair (the i-th
columns of $W_x$ and $W_y$). Meanwhile, the projection matrices
$W_{x, m \times c}$ and $W_{y, p \times c}$ are representations of
learning resources, projected into the shared space. Having this shared
component space, we can compare and relate activities that are present
in the two resource types.


In other words, each learning material i from the auxiliary learning
resource in matrix X will be represented as a $1 \times c$ vector
$W_{x,i}$ (row i of $W_x$), and each learning material j from the
target learning resource in matrix Y will be represented as a
$1 \times c$ vector $W_{y,j}$ (row j of $W_y$). So, we can find the
most similar resources from different types by looking at the cosine
similarity between those vectors in the shared component space.
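The cross-type comparison described here can be sketched in a few lines of plain Python. The projection rows below are made-up numbers, and `cosine` and `most_similar_pairs` are hypothetical helper names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two latent-space vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar_pairs(W_x, W_y, top_k=3):
    """Rank (auxiliary resource i, target resource j) pairs by cosine similarity."""
    scored = [(cosine(wx, wy), i, j)
              for i, wx in enumerate(W_x)
              for j, wy in enumerate(W_y)]
    scored.sort(reverse=True)
    return scored[:top_k]

# Toy projections with c = 2 components: three auxiliary and two target resources.
W_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_y = [[2.0, 0.1], [0.1, 3.0]]
pairs = most_similar_pairs(W_x, W_y, top_k=2)
```

Cosine similarity ignores vector length, so a resource with small projection weights can still be ranked as highly similar if it points in the same latent direction.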


Note that this is different from simply comparing matrices X and Y in
the shared student space by calculating their cosine similarity. Here,
we have the canonical correlation effect on finding similar learning
resources. To be more clear, if we suppose that $w_x^T X X^T w_x = 1$
and $w_y^T Y Y^T w_y = 1$ (by which we transformed Equation 1 to
Equation 2), then we have:

$$\rho = w_x^T X Y^T w_y \tag{5}$$

$\rho$ in Equation 5 is equivalent to $\rho$ in Equation 1 scaled by
its denominator. Now, if we left-multiply both sides of Equation 5 by
$(w_x^T)^{-1}$ and right-multiply both sides of it by $w_y^{-1}$, we
achieve $X Y^T = w_x^{-T} \rho \, w_y^{-1}$. Equivalently, when having
multiple canonical correlations, we can see that:

$$X Y^T = W_x^{-T} P \, W_y^{-1} \tag{6}$$

Equation 6 shows the relationship between the projection matrices and
the cosine similarity of X and Y ($X Y^T$). Clearly, $Y X^T$ and
$W_y W_x^T$ are not equal.
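The step from Equation 5 to Equation 6 can be written out for the multi-component case. The derivation below is a sketch that assumes $W_x$ and $W_y$ are square and invertible (that is, $c = m = p$), which the text does not state; with fewer components the inverses would have to be replaced by pseudo-inverses, and the last line would in general hold only approximately.

```latex
\begin{align*}
W_x^T X Y^T W_y &= P \\
(W_x^T)^{-1} W_x^T \, X Y^T \, W_y W_y^{-1} &= (W_x^T)^{-1} P \, W_y^{-1} \\
X Y^T &= W_x^{-T} P \, W_y^{-1}
\end{align*}
```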


2.2 Inter-Activity Performance Prediction 
Predicting how a student performs on a problem can help 
teachers to adjust the course material based on students’ 
predicted performance and can lead to personalized learn- 
ing. Also, it can guide students towards a structured and 
effective learning. As in many prediction problems, educa- 
tional data is usually incomplete: not all students try all 
resources. We focus on predicting students’ scores for the 
first time that they try a problem. Thus, the problem of 
predicting students’ performance can be interpreted as es- 
timating the missing values in the student activity matrix 
(Y) that is described in the beginning of Section 2.


As proposed in Section 2.1, we can find the relationship between sets
of learning resources of two types using CCA. Thus, if we know
students’ performance on auxiliary learning resources in matrix X and
their performance in the target learning resource in matrix Y, we can
understand how students’ activities on auxiliary learning resources
affect the same students’ performance on the target learning resources.
When the student activity matrix (Y) is incomplete, we can estimate
$w_x$ and $w_y$ by calculating the canonical correlations between the
auxiliary activity matrix X and the incomplete target activity matrix Y
to obtain the estimated projection vectors $\hat{w}_x$ and $\hat{w}_y$.
Using these projection vectors, we can estimate a complete activity
matrix $\hat{Y}$ as in Equation 7.

$$\hat{Y} = \hat{w}_y \, \rho \, \hat{w}_x^T X \tag{7}$$

Here, student activities in the auxiliary learning resource are mapped
to the shared latent space, scaled by the canonical correlation factor
$\rho$, and then mapped back to the target learning resource space. In
case of calculating multiple (c) projection vector pairs
($\hat{W}_{x, m \times c}$ and $\hat{W}_{y, p \times c}$) with
canonical correlations represented in $P_{c \times c}$, we estimate
students’ performance ($\hat{Y}$) as in Equation 8.

$$\hat{Y} = \hat{W}_y P \, \hat{W}_x^T X \tag{8}$$
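Equation 8 is a chain of three matrix products, which the following plain-Python sketch transcribes literally. The projection matrices and correlations are made-up values, the helper names are hypothetical, and the sketch ignores any centering or rescaling of scores that a full pipeline would likely need.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def predict_target(W_x, W_y, rhos, X):
    """Estimate the target matrix as W_y P W_x^T X (Equation 8)."""
    c = len(rhos)
    P = [[rhos[i] if i == j else 0.0 for j in range(c)] for i in range(c)]
    latent = matmul(transpose(W_x), X)   # c x n: students mapped to the shared space
    scaled = matmul(P, latent)           # scale each component by its correlation
    return matmul(W_y, scaled)           # p x n: mapped back to the target resources

# Toy sizes: m = 2 auxiliary resources, p = 3 target resources, c = 2, n = 2 students.
W_x = [[1.0, 0.0], [0.0, 1.0]]
W_y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rhos = [0.5, 0.2]
X = [[1.0, 0.0], [0.0, 1.0]]
Y_hat = predict_target(W_x, W_y, rhos, X)
```

Components with small canonical correlations contribute little to the estimate, which is how the weakly related directions of the auxiliary data are down-weighted.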


3. DATASETS 


We use four datasets from two online platforms for our experiments. The
anonymized data represent log files of student interactions with course
resources (activities) and their performance in them. Each of these
platforms allows its students to learn from multiple learning resource
types, which calls for modeling inter-activity relations. The first two
datasets are richer since they have learning resource names, topics,
and contents, although we do not use them for the discovery and
prediction purposes. The third and fourth datasets are larger, from a
MOOC platform, with more variation in learning resource types. However,
we do not have access to these learning resources beyond their assigned
IDs. In the following sections, we describe each of these datasets.


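Table 1: Statistics of Mastery Grids datasets

| course | statistic                | students | problems | Parsons prob. | annot. exam. | anim. exam. |
|--------|--------------------------|----------|----------|---------------|--------------|-------------|
| Python | number                   | 319      | 37       | 43            | 58           | 53          |
| Python | average activity records | 65.5     | 147.5    | 112.3         | 97.2         | 93.8        |
| Java   | number                   | 206      | 113      | -             | 101          | 50          |
| Java   | average activity records | 127.2    | 108.3    | -             | 93.9         | 89.7        |
| Java   | density                  | 0.78     | 0.53     | -             | 0.47         | 0.44        |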

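Table 2: Statistics of Canvas Network datasets

| course                           | statistic                | students     | quiz-assign. | assign.      | discus. topics |
|----------------------------------|--------------------------|--------------|--------------|--------------|----------------|
| Business and Management          | number                   | 232          | 32           | 38           | 34             |
| Business and Management          | average activity records | 62.7         | 208.1        | 190.8        | 18.9           |
| Business and Management          | density                  | 0.60         | 0.89         | 0.82         | 0.08           |
| Professions and Applied Sciences | number                   | 1160         | 18           | 26           | 70             |
| Professions and Applied Sciences | average activity records | [illegible]  | [illegible]  | [illegible]  | [illegible]    |
| Professions and Applied Sciences | density                  | 0.14         | 0.37         | 0.32         | 0.02           |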


3.1 Mastery Grids Datasets 


Our first two datasets are collected from an online intelli- 
gent tutoring system, Mastery Grids [Loboda et al. 2014]. 
This system provides personalized access to three types of 
interactive content for Java programming and four types of 
content for Python programming. Parameterized semantic 
problems, annotated examples (code snippets with explana- 
tions), and animated examples (interactive simulations that 
visually demonstrate the runtime behavior of a code snip- 
pet) are the three types of resources that are available for 
both Java and Python courses. In addition to those, the Python course
includes the so-called Parsons problems, originally introduced in
[Parsons and Haden 2006].


The parameterized semantic problems (problems, for short) 
are generated by QuizJet and QuizPet system [Hsiao et al.] 
from a pool of parameterized questions on Java and 
Python programming. As a result, the same problem can be 
attempted multiple times by students with various parameters. We only
consider students’ first attempt on each problem for our experiments.
Annotated examples presented by WebEx allow students to interactively
explore line-by-line explanations of code snippets
[Brusilovsky and Yudelson 2008]. Working with animated examples, which
are generated using the Jsvee library [Sirkia 2016], students can execute a
Java or Python program visually, observing internal opera- 
tion, such as variable assignments and printing on a console. 
In Parsons problems, students are asked to solve a program- 
ming task by selecting and sorting provided code lines. 


Mastery Grids groups different learning resources into multiple
learning topics. Although this system offers a recommended topic
sequence in its interface, the students are free to select and work on
any of the topics and learning resources at any given time. The Java
dataset from this system is collected from Fall and Spring semesters of
2016. Among all of the students, we selected the ones who have at least
one activity in each of the problems, annotated examples, and animated
examples. A summary of statistics for these datasets is shown in
Table 1. The Python dataset is about two times sparser than the Java
dataset in terms of the number of all activities per student. Among
different resource types, the density of student activities on problems
is the closest between the two datasets. In both of the datasets,
student activities on problems are the densest and activities on
animated examples are the most sparse.


3.2 Canvas Network Datasets 

Our third and fourth datasets are publicly available from Canvas
Network [Network 2016]. Canvas Network hosts many freely available open
online courses in which it offers multiple learning resource types. More
specifically, in addition to learning modules, each course can 
have different types of assignments, discussions, and pop- 
quizzes. Participants are not limited to a specific sequence of 
learning material or assignments. Categories of the learning 
resources include “assignments”, “quiz-assignments”, “pop- 
quizzes”, “discussions”, and “wikis”. The dataset is anonymized 
such that student IDs, course names, discussion contents, 
submission contents, and course contents are not available. 


Course assignments can be quiz-style (“quiz-assignment”) or 
in longer format, for which students submit a text or video 
file (“assignments”). We choose two of the offered courses 
in Canvas Network as the third and fourth datasets for our 
experiments. These two courses are selected because they 
provide multiple learning resource types and have more ac- 
tive students in all of these resource types. The first course 
is in the “Professions and Applied Sciences” field and the 
second course is in the “Business and Management” field. 


Since assignments, quiz-assignments, and discussions have 
the most activities, we focus on these resource types in our 
experiments. Among these three, assignments and quiz- 
assignments are graded. For consistency, we normalize stu- 
dents’ grades between zero and one based on their maximum 
possible grade. For discussions, we consider a binary vari- 
able representing if a student has posted a message or not. 
We select the students who have at least one activity in each of these
learning resources. A summary of statistics for these datasets is shown
in Table 2. Discussion topics have the least dense activity matrices in
the two datasets. They are very sparse compared to student activities
on assignments and quiz-assignments. Comparing the two datasets from
Canvas Network, overall student activity in the Professions and Applied
Sciences course is very sparse, whereas the density of student
activities on all resources in the Business and Management course is
comparable with the datasets from the Mastery Grids system. However,
the distribution of student activities among various resource types is
more skewed in the Canvas Network datasets.
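The density figures discussed above can be read as the share of observed cells in a resources-by-students activity matrix. The following sketch assumes that reading and uses `None` for unobserved cells; the toy matrices and the function name `density` are illustrative only.

```python
def density(activity):
    """Share of observed (non-None) cells in a resources-by-students matrix."""
    cells = [value for row in activity for value in row]
    return sum(value is not None for value in cells) / len(cells) if cells else 0.0

# Toy matrices: 2 resources x 4 students each.
quiz_scores = [[1.0, None, 0.5, 0.0],
               [None, 1.0, None, None]]
discussion_posts = [[1, None, None, None],
                    [None, None, None, None]]

quiz_density = density(quiz_scores)              # 4 of 8 cells observed
discussion_density = density(discussion_posts)   # 1 of 8 cells observed
```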


4. EXPERIMENTS 

4.1 Experiment Setup 

Per the proposed model in Section 2, element $X_{i,j}$ in activity
matrix X represents the result of student j’s first attempt
on learning resource i. This activity result can be different
for different learning resource types. For graded learning re- 
sources, such as assignments and quiz-assignments, we use 
the normalized score of students; for problems and Parsons 
problems with success or failure feedback, we use binary 
scores; and for non-graded activities, such as reading an an- 
notated example or posting in a discussion forum, we use a 
binary indicator that shows the students’ attempt. We use 
average imputation for missing values. 
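The encoding and imputation steps can be sketched as below. The text does not say over which axis the average is taken, so the sketch assumes the mean of each learning resource (each row) and falls back to 0.0 for a resource with no observations; both choices, and the function names, are assumptions.

```python
def normalize_score(points, max_points):
    """Scale a raw grade to the [0, 1] range using the maximum possible grade."""
    return points / max_points

def impute_row_means(matrix):
    """Replace None in each row (one learning resource) with that row's observed mean."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled

normalized = normalize_score(45, 50)
# Rows are resources, columns are students; None marks a resource the student never tried.
raw = [[1.0, None, 0.0],
       [None, 0.5, None]]
completed = impute_row_means(raw)
```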


For prediction experiments, we follow a 5-fold user strati- 
fied separation of the student performance data to perform 
cross-validation on it. Particularly, in each round of exper- 
iments, we select 20% of students as test students, 15% of 
them for validation purposes, and 65% of them as train. Our 
task is to predict test students’ performance on activities in 
a target learning resource type, observing 20% of these stu- 
dents’ activities, and the training data. In the CCA-based 
proposed approach, the training data includes all students’ 
activities in the auxiliary learning resource type, in addition 
to observed activities of students in the target resources. We
repeat each round of the experiments 5 times.
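A student-level split with these proportions might look like the sketch below. It assumes that the five test folds partition the students and that validation students are drawn from the remainder of each fold; the function name `student_folds`, the seed, and the student identifiers are hypothetical, and the step that reveals 20% of each test student's target activities is not shown.

```python
import random

def student_folds(student_ids, n_folds=5, val_fraction=0.15, seed=0):
    """Split students into test/validation/train sets, one split per fold."""
    ids = list(student_ids)
    random.Random(seed).shuffle(ids)
    n_val = round(val_fraction * len(ids))
    splits = []
    for k in range(n_folds):
        test = ids[k::n_folds]                      # every n_folds-th student, offset k
        test_set = set(test)
        rest = [s for s in ids if s not in test_set]
        splits.append({"test": test,
                       "validation": rest[:n_val],
                       "train": rest[n_val:]})
    return splits

splits = student_folds([f"s{i}" for i in range(100)])
```

With 100 students, each split holds 20 test, 15 validation, and 65 training students, matching the 20/15/65 proportions described above.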


Since only quiz-assignments and assignments are graded in 
the Canvas Network datasets, and only problems and Par- 
sons problems are graded in the Mastery Grids datasets, we 
define the prediction tasks on these resource types. Discus- 
sions from the Canvas Network datasets and examples (an- 
notated and animated) from the Mastery Grids datasets are 
only used as auxiliary resources. Note that each of graded re- 
source types (quiz-assignments, assignments, problems, and 
Parsons problems) can also be used as an auxiliary resource 
for another type of graded resource in the same dataset. 


Baselines. In previous works, collaborative filtering methods have
proved successful in predicting students’ performance
[Sahebi et al. 2014]. Since our proposed approach is similar to these
approaches in discovering latent relationships among learning resources
through factorizing activity matrices, we choose two settings of the
SVD++ algorithm as our baselines.
To study whether adding student activities in an auxiliary resource
type helps better estimate students’ performance in the target resource
type, we compare our approach with a single-resource SVD++ algorithm.
In this setting we run the SVD++ algorithm only on the target learning
resource matrix, assuming that we do not have the information on
student activities in the auxiliary resource types, and compare the
results with our proposed method. To understand our CCA-based method’s
efficiency in capturing important relationships between different
learning resource types, we compare it with a paired-resource setting
of the SVD++ algorithm. Particularly, we merge the auxiliary and target
learning resource types into one set of learning materials (represented
by one matrix) and run the SVD++ algorithm on this augmented matrix.
Note that our proposed method factorizes two separate matrices at the
same time, whereas SVD++ can only factorize one matrix.


Since the student activity datasets are biased towards student success
(e.g., the average grade for problems in the Python dataset is 0.67 out
of 1), we also compare the methods with an average baseline. To do
this, we use the training dataset average as the predicted performance
for all of the students in each of the 5 data splits.
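The average baseline can be sketched as follows. The error measure is not named in this section, so root mean squared error is used here purely as an assumption, and the scores are made-up values.

```python
import math

def mean_baseline(train_scores):
    """Predict the training-set average for every student and resource."""
    return sum(train_scores) / len(train_scores)

def rmse(predicted, actual):
    """Root mean squared error (an assumed error measure for this sketch)."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

train_scores = [1, 1, 0, 1]     # toy first-attempt outcomes in the training data
test_scores = [1, 0]            # toy held-out outcomes

baseline = mean_baseline(train_scores)
error = rmse([baseline] * len(test_scores), test_scores)
```

When most attempts succeed, this constant prediction already achieves a fairly low error, which is why it serves as a useful reference point for the factorization-based methods.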


4.2 Discovering Relationships between Learning Resource Types

One of our goals in this paper is to understand relationships 
and interactions between sets of learning resources with var- 
ious types. CCA has the ability to represent each pair of 
learning resource types in the same latent space. This en- 
ables us to relate learning material of different types only 
based on student activities, without relying on their content 
or presented concepts. Since the Mastery Grids datasets provide
learning resource names and topics, we can confirm the discovered
relationships by comparing them with learning resource topic
similarities. These topics have been manually assigned to learning
resources by experts during course design in Mastery Grids. In order to
take a deeper look at the discovered similarities, we study the most
similar learning resources of different types in the same course (as
shown in Table 3). To calculate these similarities, we look at
projections of each learning resource in the shared latent space,
$W_x$ and $W_y$, and calculate the cosine similarity between them, as
mentioned in Section 2.1. We look at the most similar learning
resources of each course in the following.


The Java Dataset. For the Java dataset, we can calculate the cosine
similarity of problems with animated examples and problems with
annotated examples. We can see the most similar problems and animated
examples in rows 1-4 of Table 3. As we can see, three of these four
learning resource pairs are from the same expert-labeled topic. For
example, both problem “jWhile1” and animated example “ae_while_demo”
are about “while loops” in Java. This shows that our approach can
accurately identify the most similar problems and animated examples
based only on student activities and their performance, without knowing
their topic or content. However, the resources in row 2 are from
different expert-labeled topics, “boolean expressions” and “switch”.
While these two are not exactly the same, the switch expressions in
Java use boolean expressions in their conditional statements. So these
two topics are closely related to each other: if a student cannot
understand the “boolean expressions” topic, understanding the “switch”
topic would be difficult for this student.


The most similar Java annotated examples and problems, 
found by CCA projection matrices, are listed in rows 5-8. 
Here, we do not see the obvious similarities that were apparent
between animated examples and problems. In row
5, there is topic similarity between the problem with “loops 
do-while” topic and the annotated example with “loops for” 
topic: both of them are about loops in Java. For row 8, we 
know that Java for loops use “arithmetic operations” in their 
conditional statement. However, topics for similar resources 
discovered in rows 6 and 7 look irrelevant. Row 6’s prob- 
lem is labeled by experts with the “interfaces” topic, while 
the similar annotated example is labeled with the “variables” 
topic. Likewise, the problem topic in row 7 is “interfaces”, 
while the topic of similar annotated example is “objects”. 
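The topic comparison for the Java pairs can be made explicit with a small tally. The topic labels below are transcribed from rows 1-8 of Table 3; the tuple layout and the function name `topic_agreement` are assumptions made for this sketch, and exact label equality is a stricter test than the relatedness argument made in the text.

```python
# (row, auxiliary type, problem topic, example topic), from rows 1-8 of Table 3.
java_pairs = [
    (1, "anim", "ArrayList", "ArrayList"),
    (2, "anim", "Boolean expressions", "Switch"),
    (3, "anim", "Arithmetic operations", "Arithmetic operations"),
    (4, "anim", "Loops while", "Loops while"),
    (5, "annot", "Loops do_while", "Loops for"),
    (6, "annot", "Interfaces", "Variables"),
    (7, "annot", "Interfaces", "Objects"),
    (8, "annot", "Arithmetic operations", "Loops for"),
]

def topic_agreement(pairs, aux_type):
    """Fraction of pairs of one auxiliary type whose expert topics match exactly."""
    selected = [p for p in pairs if p[1] == aux_type]
    same = sum(1 for _, _, prob_topic, aux_topic in selected if prob_topic == aux_topic)
    return same / len(selected)

animated_agreement = topic_agreement(java_pairs, "anim")
annotated_agreement = topic_agreement(java_pairs, "annot")
```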


To gain more insight about these learning resources, we 
looked at their contents. We discovered that although the 
general topics for these problems and their discovered anno- 
Table 3: Most similar learning materials of different types, from Java and Python courses, according to their
similarity using CCA projection vectors. In each row, "first" refers to the first material type listed and
"second" to the second.

| course | material types             | row | first ID | first name               | first topic           | second topic          | second name          | second ID |
|--------|----------------------------|-----|----------|--------------------------|-----------------------|-----------------------|----------------------|-----------|
| Java   | prob. & anim. exam.        | 1   | 14       | jArrayList5              | ArrayList             | ArrayList             | ae_arraylist2_v2     | 3         |
| Java   | prob. & anim. exam.        | 2   | 18       | jBoolean_Operators       | Boolean expressions   | Switch                | ae_switch_demo2      | 44        |
| Java   | prob. & anim. exam.        | 3   | 65       | jMathFuc2                | Arithmetic operations | Arithmetic operations | ae_arithmetic_v2     | 1         |
| Java   | prob. & anim. exam.        | 4   | 100      | jWhile1                  | Loops while           | Loops while           | ae_while_demo        | 49        |
| Java   | prob. & annot. exam.       | 5   | 37       | jDowhile1                | Loops do_while        | Loops for             | for1_v2              | 28        |
| Java   | prob. & annot. exam.       | 6   | 57       | jInterfaces1             | Interfaces            | Variables             | Print Tester         | 78        |
| Java   | prob. & annot. exam.       | 7   | 61       | jInterfaces5             | Interfaces            | Objects               | AccessorMutatorDemo  | 1         |
| Java   | prob. & annot. exam.       | 8   | 63       | jMathCeil                | Arithmetic operations | Loops for             | JavaTutorial_4_6_8   | 57        |
| Python | prob. & annot. exam.       | 9   | 3        | q_py_arithmetic1         | Variables             | Variables             | pyt1.3               | 5         |
| Python | prob. & annot. exam.       | 10  | 21       | q_py_nested_if_elif1     | if_statements         | values_references     | pytt10.25            | 58        |
| Python | prob. & annot. exam.       | 11  | 23       | q_py_obj_account1        | classes_objects       | Lists                 | pyt7.2               | 53        |
| Python | prob. & anim. exam.        | 12  | 7        | q_py_dict_access1        | dictionary            | loops                 | ae_adl_while         | 39        |
| Python | prob. & anim. exam.        | 13  | 29       | q_py_output1             | output_formatting     | variables             | ae_adl_arithmetics2  | 1         |
| Python | prob. & anim. exam.        | 14  | 10       | q_py_fun_car1            | functions             | exceptions            | ae_adl_tryexcept2    | 34        |
| Python | prob. & pars. prob.        | 15  | 10       | q_py_fun_car1            | functions             | exceptions            | ps_python_try_adding | 38        |
| Python | prob. & pars. prob.        | 16  | 12       | q_py_if_elif1            | if_statements         | loops                 | combo_python_while   | 9         |
| Python | prob. & pars. prob.        | 17  | 35       | q_py_swap1               | variables             | variables             | combo_swap           | 11        |
| Python | pars. prob. & annot. exam. | 18  | 1        | combo_avg                | variables             | variables             | pyt2.1               | 32        |
| Python | pars. prob. & annot. exam. | 19  | 14       | ps_python_addition       | variables             | variables             | pyt1.2               | 4         |
| Python | pars. prob. & annot. exam. | 20  | 41       | ps_return_bigger_or_none | functions             | functions             | pyt10.7              | 30        |
| Python | pars. prob. & anim. exam.  | 21  | 1        | combo_avg                | variables             | variables             | ae_python_assignment | 40        |
| Python | pars. prob. & anim. exam.  | 22  | 12       | ps_hello                 | variables             | variables             | ae_adl_arithmetics2  | 1         |
| Python | pars. prob. & anim. exam.  | 23  | 43       | ps_simple_params         | functions             | functions             | ae_adl_returnvalue   | 29        |
public class Tester { 3 . 5 z 
public static void main(String[] args) { the designers of Java course were interested in the mentioned 
Mechanism mechl = new Computer(2.0, 2.0, true); topics while designing these learning resources, we are dis- 


Mechanism mech2 = new Car("Honda", 2); 


Computer comp (Computer) mechl; 


System.out.println(comp.getProcessorSpeed()); 
System.out.println(comp.reportProblems()); 


System.out.println(((Car) mech2).getBrand()); 
System.out.println(mech2.reportProblems()); 
} 
} 


What is the output? 


Be careful of the whitespace(space, newline) in your answer. 


Figure 1: Content of problem with “Interfaces” topic 
(row 6 of Table [3) 


tated examples are not the same, they include very sim- 
ilar concepts. For example, Figure }1| shows the content 
for problem “jInterfaces1” (topic: “interfaces”), and Figure 
shows the content for annotated example “PrintTester” 
(topic: “variables”). As we can see, the concept of printing 
an output in the console is very important in both of these 
learning resources. Interestingly, it appears that although 
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covering other possible “latent topics” for them. Another 
factor in these newly-found relations can be the mixed rela- 
tionship of annotated examples with students performance. 
Hosseini et al. have studied the use and impact of annotated 
and animated examples in the same online tutoring system 
and concluded that students are likely to learn more from 
animated examples [Hosseini et al. 2016]. Particularly, they 
showed that although more views of animated examples is 
associated with a higher course grade, the number of views 
on annotated examples has a negative effect on it. A possible 
reason is the negative process of associating examples with 
poor knowledge: students with poor knowledge are more 
likely to study annotated examples. This association can 
potentially overcome the positive impact of learning from 
annotated examples and lead to a negative impact. Also, 
they show that animated examples provided better impact 
on problem solving success and post-test scores. 


    public class PrintTester
    {
        public static void main(String[] args)
        {
            System.out.println(3 + 4);
            System.out.println("Hello");
            System.out.println("World!");
            System.out.print("00");
            System.out.println(3 + 4);
            System.out.println("Goodbye");
        }
    }


Figure 2: Content of annotated example with “Variables”
topic (row 6 of Table 3)


    class Account:
        def __init__(self, deposit=0):
            self.balance = deposit

        def deposit(self, sum):
            self.balance += sum

        def withdraw(self, sum):
            self.balance -= sum

        def get_balance(self):
            return self.balance

    def main():
        accounts = {}
        accounts[0] = Account()
        accounts[1] = Account(379)
        accounts[0].deposit(379)
        accounts[1].deposit(379)
        accounts[0].withdraw(379-50)
        accounts[1].withdraw(379-100)
        print(accounts[0].get_balance() + accounts[1].get_balance())

    main()

    What is the output?

    Be careful of the whitespace (space, newline) in your answer.


Figure 3: Content of problem with “classes_objects”
topic (row 11 of Table 3)


The Python Dataset. We study 5 pairs of resource types
and the cosine similarities between the W_x's and W_y's in the
Python dataset: problems vs. animated examples, problems
vs. annotated examples, Parsons problems vs. animated
examples, Parsons problems vs. annotated examples, and
problems vs. Parsons problems. Samples of discovered
similar learning resources are shown in Table 3.


As shown in rows 9-11, the first problem and its matched
annotated example have the same topic, “variables”. However,
the next two pairs do not have a common topic. We study
the content of these learning resources to understand the
nature of their similarity. For example, in row 11, annotated
example “pyt7.2” has the topic “lists”. If we look at problem
“q_py_obj_account1” with the topic “classes_objects” in Figure 3,
we can see that this problem uses list-style indexing (the
accounts variable). We omit the content for the pair in row 10
due to space limits.


Rows 12-14 show similar animated examples and problems
in the Python dataset. To show the similarities between
concepts used in these animated examples and problems, we
look at one pair: problem “q_py_fun_car1” with topic
“functions” (Figure 4) and animated example “ae_adl_tryexcept2”
with topic “exceptions” (Figure 5). We can see that there
is a function call and a function definition in this animated
example (Figure 5). Consequently, although this animated
example is not designed to teach the “functions” topic and is
labeled with the “exceptions” topic only, the discovered
similarities show the associations between students’ learning
of functions and this animated example.


The most similar problems and Parsons problems are shown
in rows 15-17 of Table 3. Two of the top similar pairs are
from the same (“variables”) or related (“if statements” and
“loops”) topics. The resources in row 15 are from different
topics: a “functions” problem and an “exceptions” Parsons
problem. But, as can be seen in Figures 4 and 6, the Parsons
problem includes a function definition. So, students can
learn about functions while solving this Parsons problem
that is about exceptions.


    def fuel(gallons, gas, tank_size):
        gas = min(gallons + gas, tank_size)
        return gas
    gas = 50-42
    gallons = fuel(25, gas, 50)
    print(gallons)

    What is the output?

    Be careful of the whitespace (space, newline) in your answer.


Figure 4: Content of problem with “functions” topic
(rows 14 and 15 of Table 3)


    def average(a, b):
        sum = int(a) + int(b)
        return sum / 2

    def main():
        try:
            avg = average("1", "two")
            print("Avg is:", avg)
        except ValueError:
            print("Error occurred!")

    main()


Figure 5: Content of animated example with “exceptions”
topic (row 14 of Table 3)


    Construct a function that adds two numbers together and handles non-numeric input.

    Drag from here:
        print("Can only add numbers together.")
        except TypeError:
        return a + b
        def add_two_numbers(a,b):
        try:


Figure 6: Content of Parsons problem with “Exceptions”
topic (row 15 of Table 3)


Finally, as we can see in rows 18-23, analogous samples of
Parsons problems vs. annotated examples, and Parsons problems
vs. animated examples are all from the same topics.

One may think that the discovered similarities are a result
of topic arrangements in the course design and conclude that
we can find these similar learning resources by only looking
at the co-occurrence of student activities in two learning
resource types, e.g., by calculating the cosine similarities
between learning resources in the original student-space, or
matrices X and Y. However, looking at some of the discovered
similarities, such as the second row of Table 3, reassures
us that our approach can find relationships beyond trivial
co-occurrence. As we have mentioned, the “switch” and
“boolean expressions” topics are not the same, but are
closely related. In the Mastery Grids interface, these two
topics are not placed right next to each other: another
topic (“ifelse”) is placed between them. This means
that the discovered similarity is not solely based on activity
co-occurrence due to topic placement in Mastery Grids.


To discover what we can gain from trivial co-occurrences,
without using our proposed method, we looked at samples
of the most similar learning resources, based on the cosine
similarity between student activities in the original student
space (similarity between matrices X and Y). In this case,
the most similar discovered learning resource pairs are either
placed closely in the same topic (and thus may arise from
students following the sequence imposed by the learning
resource arrangement in the interface), or do not have
any meaningful content-based relationship. For example,
the most similar animated example that is discovered in
student space for the “jBoolean_Operators” problem (problem
in row 2 of Table 3) is labeled with the “primitive data types”
topic, demonstrating “Double” and “Short” data types.
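
The student-space comparison described above can be illustrated with a minimal sketch. It assumes X and Y are small dense matrices whose rows are students and whose columns are resources; the toy values and the helper names (cosine, columns, most_similar) are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def columns(matrix):
    """Return the columns of a row-major matrix as lists."""
    return [list(col) for col in zip(*matrix)]

def most_similar(target_vectors, auxiliary_vectors):
    """For each target vector, return (index, similarity) of the closest auxiliary vector."""
    matches = []
    for t in target_vectors:
        sims = [cosine(t, a) for a in auxiliary_vectors]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches

# Hypothetical toy data: rows are students, columns are resources.
X = [  # scores on 2 problems
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
    [0.0, 0.0],
]
Y = [  # activity on 3 examples
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Each problem column is matched to the example column with the highest cosine.
student_space_matches = most_similar(columns(X), columns(Y))
```

The same most_similar helper could, in principle, be applied to one projection vector per resource instead of the raw student columns, which is the comparison that Table 3 reports; how those projection vectors are obtained is not shown here.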


To summarize, the discovered CCA-based similarities in both
datasets are meaningful. Some of the related learning resource
pairs are from the same topics, others are related in
the concepts or sub-topics that they present. In general,
this is a very promising result, especially for applications in
which the learning resource contents are difficult to analyze
and compare. By discovering these similarities, instructors can
rearrange their learning material in ways that most benefit
students’ learning. These similarities can also be used for
multi-source knowledge modeling of students. Namely, we can
model student knowledge in shared concepts between problems and
animated examples and understand how a student’s ability
in one learning resource type (e.g., to solve a problem)
increases by trying another learning resource of a different
type (e.g., a related animated example).


4.3 Predicting Student Performance Using Auxiliary Resource Types

Using the formulation proposed in Section 2.2, our goal here
is to predict students’ performance using auxiliary learning
resource types and compare it with similar baseline approaches.
We measure the performance of the proposed and baseline
approaches using Root Mean Squared Error (RMSE). This measure
is the square root of the mean squared difference between
students’ actual scores and their predicted performance.
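
As a small illustration of the RMSE measure, the sketch below computes it for a handful of made-up scores. The function name rmse and the example values are hypothetical; only the formula itself is standard.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length score lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical scores in [0, 1] for four student-problem pairs.
actual_scores = [1.0, 0.0, 1.0, 1.0]
predicted_scores = [0.5, 0.5, 1.0, 0.0]
example_rmse = rmse(actual_scores, predicted_scores)
```

Because the errors are squared before averaging, a single large miss (the last pair here) contributes more to the result than several small ones.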


Mastery Grids Datasets. For the Java programming dataset,
we run two sets of experiments. The first set of experiments
is on predicting students’ performance on problems,
using their activities on annotated examples as auxiliary
data (“annotated examples → problems”). In the second
set of experiments, we use animated example activities as
the auxiliary resource for predicting students’ performance
on problems (“animated examples → problems”). As mentioned
before, we compare the results of our proposed approach
with single-resource SVD++ (only using student logs
on problems) and paired-resource SVD++ (with the same
input as our proposed approach).


For the Python programming dataset, we run six sets of 
experiments. Having problems and Parsons problems as 
target learning resource types, we use annotated examples 
and animated examples as the auxiliary learning resources. 
Additionally, problems may help us in predicting students’ 
performance in Parsons problems, and vice versa. 


Table 4 shows the RMSE of CCA-based and baseline approaches
for these sets of experiments on both Mastery Grids datasets.
The numbers in parentheses report the 95% confidence
interval for the reported errors. As we can see here, our
CCA-based approach performs significantly better than the
baselines in all of the experiment setups in both datasets.
As our proposed approach performs better than single-resource
SVD++, we can conclude that adding the auxiliary data
significantly improves student performance prediction. On the
other hand, we can see that the proposed CCA-based approach
works better than SVD++ in the multi-resource setting using
the same set of auxiliary and target data. Therefore, we can
conclude that our approach is a better fit for effectively
using auxiliary data.
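
The text does not spell out how the intervals were computed or how significance was judged. The sketch below is one possible reading, under two explicit assumptions: the number in parentheses is the half-width of the interval around the reported RMSE, and two results are treated as different when their intervals do not overlap. The helper names are hypothetical; the two values are the Java “animated examples → problems” entries for the CCA approach and the average baseline in Table 4.

```python
def interval(mean, half_width):
    """Return (low, high) for an error reported as mean (half_width). Assumed reading."""
    return (mean - half_width, mean + half_width)

def intervals_overlap(a, b):
    """True if two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Java, animated examples -> problems (values from Table 4).
cca = interval(0.4148, 0.0097)
average_baseline = interval(0.4859, 0.0071)

# Under the assumptions above, non-overlap is read as a significant difference.
separated = not intervals_overlap(cca, average_baseline)
```

Non-overlapping intervals are a conservative check; a formal test on the per-run errors could reach a different verdict for borderline pairs.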


Comparing the two settings for SVD++, in the Python
dataset single-resource SVD++ performs as well as, or
significantly better than, paired-resource SVD++. Specifically,
for combinations “animated examples → problems” and
“annotated examples → problems”, paired-resource SVD++ has
a significantly higher error than single-resource SVD++.
This confirms our findings in Section 4.2 about smaller
similarities between problems and examples in the Python dataset.
As expected in biased datasets, we can see that the average
baseline is working very well. Compared with paired-resource
SVD++, its error is significantly lower in four of the
experiments on the Python dataset. Single-resource SVD++ is
significantly better than (in “animated examples → problems”,
“annotated examples → problems”, and “problems →
Parsons problems”) or similar to the average baseline.


In contrast, in the Java dataset, the average baseline has
slightly, but significantly, higher error than the proposed
approach and the other two baselines for “annotated examples
→ problems”. For “animated examples → problems”, the
average baseline has better predictions compared to the other
two baselines. Also, paired-resource SVD++ works significantly
better than single-resource SVD++ for “annotated
examples → problems”. This shows that paired-resource
SVD++ is not consistent across datasets, even if similar
learning resource types are used, and that a more advanced
approach, such as the proposed one, is needed to take
advantage of auxiliary information.
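
The average baseline is not defined in detail in this section. As a minimal sketch, assuming it predicts the global mean of the observed training scores for every held-out student-resource pair, it could look as follows. The function name and the toy scores are hypothetical.

```python
import math

def fit_average_baseline(train_scores):
    """Return the global mean of the training scores (assumed baseline definition)."""
    if not train_scores:
        raise ValueError("need at least one training score")
    return sum(train_scores) / len(train_scores)

# Hypothetical scores; most attempts succeed, i.e. the data is biased.
train_scores = [1.0, 1.0, 0.0, 1.0]
test_scores = [1.0, 0.0]

prediction = fit_average_baseline(train_scores)

# RMSE of predicting the same constant for every held-out score.
squared_errors = [(s - prediction) ** 2 for s in test_scores]
baseline_rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
```

When most scores are close to the mean, as in a biased dataset, such a constant predictor is hard to beat, which is consistent with the average baseline doing well on the Python data.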


Table 4: RMSE for student performance prediction task on Mastery Grids datasets.

dataset | approach              | anim. example → problem | annot. example → problem | pars. prob. → problem | anim. example → pars. prob. | annot. example → pars. prob. | prob. → pars. prob.
Java    | paired-resource CCA   | 0.4148 (0.0097)         | 0.4159 (0.0057)          | -                     | -                           | -                            | -
Java    | paired-resource SVD++ | 0.5304 (0.0127)         | 0.4696 (0.0047)          | -                     | -                           | -                            | -
Java    | single-resource SVD++ | 0.5178 (0.0214)         | 0.4537 (0.0119)          | -                     | -                           | -                            | -
Java    | average baseline      | 0.4859 (0.0071)         | 0.4854 (0.0039)          | -                     | -                           | -                            | -
Python  | paired-resource CCA   | 0.4584 (0.0035)         | 0.4566 (0.0024)          | 0.4579 (0.007)        | 0.4122 (0.0081)             | 0.4098 (0.0043)              | 0.4105 (0.0075)
Python  | paired-resource SVD++ | 0.516 (0.0124)          | 0.5122 (0.0156)          | 0.4954 (0.0123)       | 0.5524 (0.0083)             | 0.5213 (0.022)               | 0.456 (0.0084)
Python  | single-resource SVD++ | 0.4921 (0.0147)         | 0.4921 (0.0147)          | 0.4921 (0.0147)       | 0.4409 (0.0059)             | 0.4409 (0.0059)              | 0.4409 (0.0059)
Python  | average baseline      | 0.496 (0.0024)          | 0.4972 (0.0036)          | 0.4957 (0.0014)       | 0.4724 (0.0056)             | 0.4716 (0.0047)              | 0.4723 (0.0072)


Canvas Network datasets. Canvas Network datasets give
us the opportunity to test our approach on more varied
MOOC data and in different domains. Notably, the “Professions
and Applied Sciences” data has more users and is very
sparse compared to all other datasets. For the Canvas Network
datasets we run four sets of experiments. In the first two
sets, we use quiz-assignments and discussions as auxiliary
resources to predict students’ performance in assignments.
In the third and fourth sets of experiments we predict
students’ grades in quiz-assignments using general assignments
and discussions as auxiliary resources.


Table 5: RMSE for student performance prediction task on Canvas Network datasets, using discussions,
quiz-assignments, and assignments as auxiliary resources.

dataset                          | approach              | quiz-assignments → assignments | discussions → assignments | assignments → quiz-assignments | discussions → quiz-assignments
Business and Management          | paired-resource CCA   | 0.1073 (0.0209)                | 0.1093 (0.0163)           | 0.0911 (0.0124)                | 0.1207 (0.0109)
Business and Management          | paired-resource SVD++ | 0.1871 (0.0143)                | 0.1569 (0.0115)           | 0.1696 (0.0111)                | 0.1903 (0.0085)
Business and Management          | single-resource SVD++ | 0.1890 (0.0208)                | 0.1890 (0.0208)           | 0.1532 (0.0125)                | 0.1532 (0.0125)
Business and Management          | average baseline      | 0.1741 (0.0182)                | 0.1741 (0.0182)           | 0.1752 (0.0118)                | 0.1752 (0.0118)
Professions and Applied Sciences | paired-resource CCA   | 0.1264 (0.0085)                | 0.1252 (0.0049)           | 0.1252 (0.0035)                | 0.1287 (0.0105)
Professions and Applied Sciences | paired-resource SVD++ | 0.2070 (0.0112)                | 0.1897 (0.0140)           | 0.2039 (0.0211)                | 0.3254 (0.0171)
Professions and Applied Sciences | single-resource SVD++ | 0.5235 (0.0196)                | 0.5235 (0.0196)           | 0.2057 (0.0176)                | 0.2057 (0.0176)
Professions and Applied Sciences | average baseline      | 0.4596 (0.0019)                | 0.4596 (0.0019)           | 0.3838 (0.0037)                | 0.3838 (0.0037)


Table 5 shows the RMSE of all approaches on both the
“Professions and Applied Sciences” and “Business and Management”
datasets. Similar to our results on the Mastery Grids datasets,
we can see that the proposed approach can effectively use
auxiliary resources to provide a better estimation of student
performance in all resource pairs. Comparing paired-resource
SVD++ to single-resource SVD++, we can see that in most
of the experiments their error is not significantly different.
Only for “quiz-assignments → assignments” and “discussions
→ assignments”, in the “Professions and Applied Sciences”
dataset, is paired-resource SVD++ significantly better than
single-resource SVD++. As for the average baseline, its error
is significantly higher than (in the “Professions and Applied
Sciences” dataset) or similar to that of paired-resource SVD++.
Compared to single-resource SVD++, however, it works better
in predicting assignments, and worse in predicting
quiz-assignments. This is because there is more variation in
students’ scores in quiz-assignments.


In addition to the way different courses are designed and
learning resources are prepared, the different results between
the two datasets may be due to differences in the
characteristics of the two course datasets. For example,
having more students and being sparser may lead to
added value of auxiliary information in the “Professions and
Applied Sciences” dataset (Table 2). In other words, the
effectiveness of adding auxiliary data for the task of
performance prediction depends on the dataset and its
characteristics.


5. CONCLUSIONS 


We proposed an approach inspired by canonical correlation 
analysis for discovering interrelationships between learning 
resources of different types, only using student performance 
in them. This approach can also be used to predict students’ 
performance. That is to say, we can predict students’ per- 
formance in one type of learning resources, with the help of 
student activities in another resource type. We evaluated 
the proposed approach with four datasets and two tasks. 


Proceedings of the 11th International Conference on Educational Data Mining 95 


For the task of finding learning resource interrelationships,
we evaluated our approach on the Java programming dataset
with three resource types, and the Python programming
dataset with four resource types. Finding the most similar
resources of different types, based only on student
activities, we showed that our approach is very promising in
detecting these similarities, especially for learning resources
that have been shown to have a positive effect on students’
learning. Also, we found that our approach goes beyond the
designated topics for learning resources and discovers latent
similarities that provide clues of their content similarity.


Having four datasets from two online learning systems, we
ended up with 16 total experiment sets for predicting student
performance in paired resource types. We compared
our proposed approach with an average baseline and two
algorithmic baselines: one using student activities in both
auxiliary and target resource types (paired-resource SVD++),
and one using student activities in only the target resource
type (single-resource SVD++). The experiments showed
that our proposed approach can significantly improve the
estimation of student grades in all setups and datasets. This
success is in part due to the extra information from the
auxiliary resource types on students’ performance: in three out
of 16 setups, the baseline algorithm with auxiliary data
performed better than the baseline algorithm without auxiliary
data. However, in two of the setups the baseline with
auxiliary data performed significantly worse than the baseline
without it. Meanwhile, the proposed approach performed
better than both baselines in all of the 16 experiments. This
shows that the better performance of the proposed approach is
not only because of having extra information, but also
because of its ability to use latent interrelationships between
auxiliary and target resource types in a more efficient way.